
fix: decode answers file content explicitly as UTF-8 #1970

Merged
sisp merged 1 commit into copier-org:master from sisp:fix/umlaut-in-answers
Feb 17, 2025


sisp (Member) commented Feb 17, 2025

I've fixed a string encoding/decoding bug on Windows that broke updating a project whose recorded answers contain non-ASCII characters.

According to my investigation, pathlib.Path.read_text() calls io.text_encoding(), which determines a default encoding when no encoding value is provided (i.e., encoding=None). When I reproduced the error on a fairly clean Windows 11 installation, sys.flags.utf8_mode == 0, so io.text_encoding(encoding=None) returned "locale" and locale.getencoding() returned "cp1252". At the same time, the to_nice_yaml filter passes the serialized YAML string through to_text() without an explicit value for the encoding argument, which defaults to "utf-8". Hence, when the answers file content is rendered as {{ _copier_answers|to_nice_yaml }}, it is always encoded as UTF-8, but on Windows 11 it was decoded as CP-1252.
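To make the mismatch concrete, here is a minimal sketch that reproduces the symptom without Copier. The answer value is a hypothetical example; the only point is that bytes written as UTF-8 turn into mojibake when decoded as CP-1252.

```python
# Sketch of the mismatch described above: text that was encoded as UTF-8
# is decoded with a different codec (CP-1252).
answer = "Müller"  # hypothetical recorded answer containing an umlaut

utf8_bytes = answer.encode("utf-8")    # how the rendered answers end up on disk
garbled = utf8_bytes.decode("cp1252")  # what a cp1252 default decoding produces
correct = utf8_bytes.decode("utf-8")   # what an explicit UTF-8 decoding produces

print(repr(garbled))  # 'MÃ¼ller'
print(repr(correct))  # 'Müller'
```

The two-byte UTF-8 sequence for "ü" is read as two separate CP-1252 characters, which matches the kind of corrupted umlaut reported in the linked issue.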

The solution is to always decode the answers file content as UTF-8.
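As an illustration of the idea (not the actual patch), reading a file with an explicit encoding makes the result independent of the platform default. The helper name read_answers_text and the file content below are hypothetical.

```python
import tempfile
from pathlib import Path


def read_answers_text(path: Path) -> str:
    """Hypothetical helper: always decode the file as UTF-8.

    Passing encoding="utf-8" bypasses the locale-dependent default that
    Path.read_text() would otherwise use.
    """
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    answers_file = Path(tmp) / ".copier-answers.yml"
    # Simulate a file that was written as UTF-8, as the rendered answers are.
    answers_file.write_bytes("author: Jörg Müller\n".encode("utf-8"))
    content = read_answers_text(answers_file)

print(content)
```

With the explicit argument, the same text comes back on any operating system, regardless of sys.flags.utf8_mode or the active locale.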

I believe we'll need to fix this problem also for loading from external data files and for loading user settings. I'll send follow-up PRs if/where needed once this one has been merged.

Fixes #1963.


codecov bot commented Feb 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.06%. Comparing base (3ae6b78) to head (0d702f8).
Report is 126 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1970   +/-   ##
=======================================
  Coverage   98.05%   98.06%           
=======================================
  Files          53       53           
  Lines        5552     5567   +15     
=======================================
+ Hits         5444     5459   +15     
  Misses        108      108           
Flag Coverage Δ
unittests 98.06% <100.00%> (+<0.01%) ⬆️


sisp merged commit 21b486c into copier-org:master Feb 17, 2025
22 checks passed
sisp deleted the fix/umlaut-in-answers branch February 17, 2025 14:53
sisp (Member, Author) commented Feb 19, 2025

> I believe we'll need to fix this problem also for #1880 and for #1940. I'll send follow-up PRs if/where needed once this one has been merged.

I realized that the user settings file doesn't seem to be affected by this encoding/decoding issue because, unlike the answers file, it isn't shared across developer machines with potentially different operating systems. Hence, when the user settings file is encoded with the system's default encoding, there should be no problem, as Copier currently uses that same encoding to decode the file content.
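The reasoning can be sketched as follows, assuming a machine whose default encoding is CP-1252 and a hypothetical setting value. A round trip with one codec is lossless; the problem only appears when the bytes are decoded with a different codec, as can happen with a file shared across machines.

```python
# Sketch: the same codec on both sides round-trips cleanly, while a
# different codec on the reading side does not.
setting = "Müller"  # hypothetical value in a user settings file

local_bytes = setting.encode("cp1252")       # written with the system default
same_machine = local_bytes.decode("cp1252")  # read back with the same default

try:
    local_bytes.decode("utf-8")  # read on a machine that assumes UTF-8
    cross_machine_failed = False
except UnicodeDecodeError:
    cross_machine_failed = True

print(same_machine == setting)  # True
print(cross_machine_failed)     # True
```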


Development

Successfully merging this pull request may close these issues.

Copier is messing up Umlaut on template update
